(57) Abstract:

PROBLEM TO BE SOLVED: To automatically extract related keywords
which match the characteristics of the documents actually being
retrieved and which yield one or more retrieval results when
retrieval is executed using them.
SOLUTION: An automatic extraction device for related keywords is
provided with a document set selection part 19 for specifying a
subset of documents based on the attribute information of each
document, an input retrieval expression, etc.; a word statistic
information management part 17 for managing the statistic information
of each word in the whole object document set 11, as well as the words
appearing in each document and their statistic information 15;
and a word ranking part 18 for calculating the importance of each word
appearing in a given document subset and arranging the words in
order of importance. The management part 17 quickly finds the
statistic information of each word both in the whole document set
and in the specified document subset. Consequently, the words
appearing in a given document set can be ranked by importance, and
a part of the ranked words can be presented as related keywords.
* NOTICES *

Japan Patent Office is not responsible for any
damages caused by the use of this translation.

1. This document has been translated by computer. So the translation may not reflect the original precisely.
2. **** shows the word which can not be translated.
3. In the drawings, any words are not translated.



CLAIMS 



[Claim(s)] 

[Claim 1] Related keyword automatic-extracting equipment which operates on a document set from which
statistical information, such as the frequency of occurrence and distribution of each word or group of words
appearing in each document of an object document set, has been extracted beforehand using a dictionary,
which is equipped with the following, and which is characterized by extracting words or groups of words
together with their significance for only a particular part of the ranked word group and presenting them in a
reusable form:
a document set selection section which specifies a subset of documents based on attribute information given
to each document and a reference formula input by the user;
a word statistical information management section which manages the statistical information of each word in
the whole object document set and, for every document, the words which appear in that document and their
statistical information; and
a word ranking section which computes the significance of each word appearing in the specified subset, based
on the statistical information of each word in the whole document set and for every document, and arranges
the words in order of significance.

[Claim 2] Related keyword automatic-extracting equipment according to claim 1, characterized in that, when a
subset B contained in a specified subset A is specified by the document set selection section, the significance
of each word appearing in subset B is computed by taking into account the difference between the statistical
information of the words appearing in the document group contained in subset A and the statistical
information of the words appearing in the document group contained in subset B, and the result is reflected in
the word ranking.

[Claim 3] Related keyword automatic-extracting equipment according to claim 1 or 2, characterized in that the
document set selection section is provided with a function which gives a weight to each document, and the
significance of each word contained in each document of the specified document set is computed by taking
into account the weight of that document and is reflected in the word ranking.

[Claim 4] Related keyword automatic-extracting equipment according to any one of claims 1 to 3, characterized
in that only words highly effective for reuse can be selected by excluding words whose frequency of
occurrence in the whole object document set is high or low with respect to a threshold defined beforehand.

[Claim 5] Related keyword automatic-extracting equipment according to claim 4, characterized in that only
words highly effective for reuse can be selected by changing the exclusion threshold according to a feature
quantity of the word, such as the length of the word.

[Claim 6] Related keyword automatic-extracting equipment according to any one of claims 1 to 5, characterized
by having an appearance information management section which manages information on the position at which
a word appears or the context in which it appears, and by computing the significance of the word by taking
into account a weight set beforehand according to the kind of appearance information of the word, the result
being reflected in the word ranking.

[Claim 7] Related keyword automatic-extracting equipment according to any one of claims 1 to 6, characterized
by having a language attribute management section which manages attribute information of each word, such
as its part of speech, and by computing the significance of the word by taking into account a weight defined
beforehand according to the attribute of the word, the result being reflected in the word ranking.

[Claim 8] Related keyword automatic-extracting equipment according to any one of claims 1 to 7, characterized
by having a character string inclusion relation judging section which judges, according to defined conditions,
whether an inclusion relation as character strings holds between extracted words, or between a word group
specified beforehand and an extracted word, and, when the words are judged to have such an inclusion
relation, by choosing according to specified conditions only the longer character string, only the shorter
character string, only the character string with the higher significance, or both the shorter character string
and the difference between the longer and the shorter character strings ********, so that only words highly
effective for reuse can be selected.

[Claim 9] Related keyword automatic-extracting equipment according to any one of claims 1 to 8, characterized
by having a language attribute management section which manages attribute information of each word, such
as its part of speech, and by being able to classify and present the extracted words by taking into account the
attribute of each word and its frequency of occurrence, distribution, etc. in the whole specified subset or the
whole document set.

[Claim 10] Related keyword automatic-extracting equipment according to claim 9, characterized by being
provided with a representation word grant section which gives each classified word group a word
representing that group, so that either only the representation words of the classified word groups, or the
representation words together with all words, can be presented.

[Claim 11] Document-retrieval equipment for a document set from which statistical information, such as the
frequency of occurrence and distribution of each word or group of words appearing in each document of an
object document set, has been extracted beforehand using a dictionary, comprising: a reference condition
input section which inputs the conditional expression required for document retrieval; a document-retrieval
section 45 which searches for documents in the object document set according to the input reference
conditions; and a document ranking section 46 which calculates, for the documents retrieved by the
document-retrieval section 45, the goodness of fit between the input reference formula and each document;
wherein the ranking result of the document ranking section is sent to related keyword automatic-extracting
equipment, and the related keywords fed back from the related keyword automatic-extracting equipment can
be input into the reference condition input section.

[Claim 12] Document-retrieval equipment comprising a reference condition input section which inputs the
conditional expression required for document retrieval, and a document-retrieval section which searches for
documents in an object document set according to the input reference conditions, wherein the reference
condition input section can accept as reference conditions, in addition to the reference conditions input by the
user, related keywords sent from related keyword automatic-extracting equipment.

[Claim 13] A document-retrieval system for a document set from which statistical information, such as the
frequency of occurrence and distribution of each word or group of words appearing in each document of an
object document set, has been extracted beforehand using a dictionary, consisting of: document-retrieval
equipment comprising a reference condition input section which inputs the conditional expression required
for document retrieval, a document-retrieval section 45 which searches for documents in the object document
set according to the input reference conditions, and a document ranking section 46 which calculates, for the
documents retrieved by the document-retrieval section 45, the goodness of fit between the input reference
formula and each document; and related keyword automatic-extracting equipment connected to the
document-retrieval equipment; characterized in that the ranking result output from the document ranking
section of the document-retrieval equipment is sent to the related keyword automatic-extracting equipment,
and related keywords are fed back from the related keyword automatic-extracting equipment to the reference
condition input section of the document-retrieval equipment to perform retrieval by keyword.

[Claim 14] The document-retrieval system according to claim 13, characterized in that a document set
selection section is provided between the document-retrieval equipment and the related keyword
automatic-extracting equipment, the ranking result output from the document ranking section of the
document-retrieval equipment is sent to the document set selection section, which specifies documents, and
the subset of documents specified by the document set selection section 47 is input into the related keyword
automatic-extracting equipment 48.

[Claim 15] The document-retrieval system according to claim 13 or 14, characterized by using the related
keyword automatic-extracting equipment according to any one of claims 1 to 10 as the related keyword
automatic-extracting equipment.

[Claim 16] A document-retrieval system consisting of: document-retrieval equipment comprising a reference
condition input section which inputs the conditional expression required for document retrieval, and a
document-retrieval section which searches for documents in an object document set according to the input
reference conditions; and related keyword automatic-extracting equipment connected to the
document-retrieval equipment; characterized in that the reference condition input section of the
document-retrieval equipment accepts as reference conditions, in addition to the reference conditions input
by the user, the related keywords sent from the related keyword automatic-extracting equipment, and
retrieval by keyword is performed.

[Claim 17] The document-retrieval system according to claim 16, characterized by using the related keyword
automatic-extracting equipment according to any one of claims 1 to 10 as the related keyword
automatic-extracting equipment.



[Translation done.] 



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Industrial Application] This invention relates to related keyword automatic-extracting equipment for
extracting, as keywords, the phrases which characterize a specific document set, and to document-retrieval
equipment using the related keyword automatic-extracting equipment.
[0002] 

[Description of the Prior Art] In document-retrieval equipment, a user must input a reference formula using
suitable search terms in order to obtain the documents the user needs, but there is a problem that users
cannot easily think of suitable search terms themselves. Conventionally, therefore, techniques have been
used which help the user to search again by presenting words related to the search term the user input,
using a related-term dictionary or the like. However, because such techniques depend on the properties of a
related-term dictionary prepared statically beforehand, they do not yield related terms adapted to the
properties of the documents being searched. Moreover, there was a fault that it was not guaranteed that at
least one document would be obtained when searching with the obtained word.
[0003] 

[Problem(s) to be Solved by the Invention] This invention solves the aforementioned technical problem. It aims
at offering related keyword automatic-extracting equipment which computes the significance of each word in
consideration of statistical information, such as the frequency of occurrence and distribution of the word in
the specified document set, and of the statistical information of the word in the whole set of documents to be
searched, ranks the words by this significance, and extracts the word group forming a part of the ranking,
and which can thereby extract a high-quality related keyword group at high speed and dynamically, based on
the properties of the actual documents to be searched.
[0004] Moreover, it aims at offering document-retrieval equipment, and a document-retrieval system using it,
which guarantee that at least one retrieval result is obtained when retrieval is performed using the related
keyword group obtained from the aforementioned related keyword automatic-extracting equipment.
[0005] 

[Means for Solving the Problem] To attain the above-mentioned purpose, this invention provides, as related
keyword automatic-extracting equipment, a document set selection section which specifies a subset of
documents based on attribute information given to each document and a reference formula input by the user;
a word statistical information management section which manages the statistical information of each word in
the whole object document set and, for every document, the words which appear in it and their statistical
information; and a word ranking section which computes the significance of each word appearing in the
specified document subset, based on the statistical information of each word in the whole document set and
in each document, and arranges the words in order of significance. The word statistical information
management section makes it possible to obtain at high speed the statistical information of each word both in
the whole document set and in the specified document subset, so that the words appearing in the specified
document set can be ranked at high speed in order of significance and a part of them can be presented as
related keywords.

[0006] In the aforementioned composition, furthermore, by establishing a means to manage the attribute
information of words, their appearance positions in documents, etc., the weight of a word can be changed, or
words fulfilling specific conditions can be deleted from the word group after ranking, so that the precision of
the extracted word group as related terms can be raised. In addition, by classifying the extracted word group
according to the attributes and statistical properties of the words, related keywords can be presented more
intelligibly.

[0007] Moreover, to attain the above-mentioned purpose, this invention constitutes a document-retrieval
system containing document-retrieval equipment which cooperates with the related keyword
automatic-extracting equipment, and reuses the extracted related keywords as input. The extracted related
keywords suit the properties of the object documents, and, as long as the same document group is searched,
it is ensured that at least one retrieval result is obtained for each keyword, so re-retrieval can be performed
efficiently and easily.
[0008] 

[Embodiments of the Invention] The invention according to claim 1 operates on a document set from which
statistical information, such as the frequency of occurrence and distribution of each word or group of words
appearing in each document of an object document set, has been extracted beforehand using a dictionary. It
has a document set selection section which specifies a subset of documents based on attribute information
given to each document and a reference formula input by the user; a word statistical information
management section which manages the statistical information of each word in the whole object document
set and, for every document, the words which appear in that document and their statistical information; and
a word ranking section which computes the significance of each word appearing in the specified subset,
based on the statistical information of each word in the whole document set and for every document, and
arranges the words in order of significance. It has the operation of extracting words or groups of words
together with their significance for only a particular part of the ranked word group, and presenting them at
high speed in a reusable form.

[0009] In the invention according to claim 2, in the related keyword automatic-extracting equipment according
to claim 1, when a subset B contained in a specified subset A is specified by the document set selection
section, the significance of each word appearing in subset B is computed by taking into account the
difference between the statistical information of the words appearing in the document group contained in
subset A and the statistical information of the words appearing in the document group contained in subset B,
and the result is reflected in the word ranking.
[0010] In the invention according to claim 3, in the related keyword automatic-extracting equipment according
to claim 1 or 2, the document set selection section is provided with a function which gives a weight to each
document, and the significance of each word contained in each document of the specified document set is
computed by taking into account the weight of that document and is reflected in the word ranking.

[0011] The invention according to claim 4 makes it possible, in the related keyword automatic-extracting
equipment according to any one of claims 1 to 3, to select only words highly effective for reuse by excluding
from the object of related keyword extraction the words whose frequency of occurrence in the whole object
document set is high or low with respect to a threshold defined beforehand.

[0012] The invention according to claim 5 makes it possible, in the related keyword automatic-extracting
equipment according to claim 4, to select only words highly effective for reuse by changing the exclusion
threshold according to a feature quantity of the word, such as the length of the word.
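
The frequency-based exclusion described in the two preceding paragraphs might be sketched as follows. This is only an illustration under stated assumptions: the text does not give the threshold values or say how the threshold depends on word length, so the function names, the default limits, and the rule that halves the upper limit for words of one or two characters are all hypothetical.

```python
def exclusion_thresholds(word, base_low=2, base_high=1000):
    """Return (low, high) frequency limits for a word.

    ASSUMPTION: short words (1-2 characters) get a stricter upper limit.
    The source only says the threshold may vary with word length.
    """
    if len(word) <= 2:
        return base_low, base_high // 2
    return base_low, base_high


def filter_by_frequency(total_frequency):
    """Keep words whose whole-set frequency lies within their limits.

    total_frequency: {word: frequency of occurrence in the whole document set}
    """
    kept = []
    for word, freq in total_frequency.items():
        low, high = exclusion_thresholds(word)
        if low <= freq <= high:
            kept.append(word)
    return kept


# Hypothetical frequencies for illustration only.
sample = {"Internet": 120, "of": 800, "WWW": 40, "zyx": 1}
kept_words = filter_by_frequency(sample)
print(kept_words)
```

In this sample, "of" is dropped as too frequent for a short word and "zyx" as too rare, leaving "Internet" and "WWW".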
[0013] The invention according to claim 6 has, in the related keyword automatic-extracting equipment
according to any one of claims 1 to 5, an appearance information management section which manages
information on the position at which a word appears or the context in which it appears. It computes the
significance of the word by taking into account a weight set beforehand according to the kind of appearance
information of the word, and reflects the result in the word ranking.

[0014] The invention according to claim 7 has, in the related keyword automatic-extracting equipment
according to any one of claims 1 to 6, a language attribute management section which manages attribute
information of each word, such as its part of speech. It computes the significance of the word by taking into
account a weight defined beforehand according to the attribute of the word, and reflects the result in the
word ranking.

[0015] The invention according to claim 8 has, in the related keyword automatic-extracting equipment
according to any one of claims 1 to 7, a character string inclusion relation judging section which judges,
according to defined conditions, whether an inclusion relation as character strings holds between extracted
words, or between a word group specified beforehand and an extracted word. When the words are judged to
have such an inclusion relation, it chooses according to specified conditions only the longer character string,
only the shorter character string, only the character string with the higher significance, or both the shorter
character string and the difference between the longer and the shorter character strings ********, which
makes it possible to select only words highly effective for reuse.
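
The first three selection options for words in an inclusion relation could be sketched as below. This is a hedged illustration, not the patented procedure: the function name, the policy labels, and the tie-breaking rule are assumptions, "inclusion" is taken to mean a plain substring relation, and the fourth option (involving the difference of the two strings) is omitted because the translation of that passage is incomplete.

```python
def resolve_inclusions(scored_words, policy="longest"):
    """Remove one word from each pair where one string contains the other.

    scored_words: {word: significance}
    policy: "longest"  keeps the longer string,
            "shortest" keeps the shorter string,
            "highest"  keeps the string with the higher significance
                       (ASSUMPTION: on a tie the shorter string is kept).
    """
    if policy not in ("longest", "shortest", "highest"):
        raise ValueError(f"unknown policy: {policy}")
    dropped = set()
    words = list(scored_words)
    for shorter in words:
        for longer in words:
            # Skip identical words and pairs with no substring relation.
            if shorter == longer or shorter not in longer:
                continue
            if policy == "longest":
                dropped.add(shorter)
            elif policy == "shortest":
                dropped.add(longer)
            elif scored_words[shorter] < scored_words[longer]:
                dropped.add(shorter)
            else:
                dropped.add(longer)
    return {w: s for w, s in scored_words.items() if w not in dropped}


# Hypothetical scores for illustration only.
scores = {"retrieval": 3.0, "document retrieval": 5.0, "Internet": 4.0}
by_longest = resolve_inclusions(scores, "longest")
by_shortest = resolve_inclusions(scores, "shortest")
by_highest = resolve_inclusions(scores, "highest")
print(by_longest, by_shortest, by_highest)
```

Words without any inclusion relation, such as "Internet" here, pass through unchanged under every policy.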

[0016] The invention according to claim 9 has, in the related keyword automatic-extracting equipment
according to any one of claims 1 to 8, a language attribute management section which manages attribute
information of each word, such as its part of speech, and makes it possible to classify and present the
extracted words by taking into account the attribute of each word and its frequency of occurrence,
distribution, etc. in the whole specified subset or the whole document set.

[0017] The invention according to claim 10 is provided, in the related keyword automatic-extracting equipment
according to claim 9, with a representation word grant section which gives each classified word group a word
representing that group, and makes it possible to present either only the representation words of the
classified word groups, or the representation words together with all words.

[0018] The invention according to claim 11 is document-retrieval equipment for a document set from which
statistical information, such as the frequency of occurrence and distribution of each word or group of words
appearing in each document of an object document set, has been extracted beforehand using a dictionary. It
has a reference condition input section which inputs the conditional expression required for document
retrieval, a document-retrieval section which searches for documents in the object document set according
to the input reference conditions, and a document ranking section which calculates, for the documents
retrieved by the document-retrieval section, the goodness of fit between the input reference formula and
each document. It has the operation of sending the ranking result of the document ranking section to the
related keyword automatic-extracting equipment, and inputting into the reference condition input section the
related keywords fed back from the related keyword automatic-extracting equipment.

[0019] The invention according to claim 12 is document-retrieval equipment which has a reference condition
input section which inputs the conditional expression required for document retrieval, and a
document-retrieval section which searches for documents in an object document set according to the input
reference conditions. The reference condition input section has the operation of accepting as reference
conditions, in addition to the reference conditions input by the user, the related keywords sent from the
related keyword automatic-extracting equipment.

[0020] The invention according to claim 13 is a document-retrieval system for a document set from which
statistical information, such as the frequency of occurrence and distribution of each word or group of words
appearing in each document of an object document set, has been extracted beforehand using a dictionary. It
has document-retrieval equipment comprising a reference condition input section which inputs the conditional
expression required for document retrieval, a document-retrieval section which searches for documents in
the object document set according to the input reference conditions, and a document ranking section which
calculates, for the documents retrieved by the document-retrieval section, the goodness of fit between the
input reference formula and each document; and it has related keyword automatic-extracting equipment
connected to the document-retrieval equipment. It has the operation of sending the ranking result output
from the document ranking section of the document-retrieval equipment to the related keyword
automatic-extracting equipment, feeding back related keywords from the related keyword
automatic-extracting equipment to the reference condition input section of the document-retrieval
equipment, and performing retrieval by keyword.

[0021] In the invention according to claim 14, in the document-retrieval system according to claim 13, a
document set selection section is provided between the document-retrieval equipment and the related
keyword automatic-extracting equipment. The ranking result output from the document ranking section of
the document-retrieval equipment is sent to the document set selection section, which specifies documents,
and the subset of documents specified by the document set selection section 47 is input into the related
keyword automatic-extracting equipment 48.

[0022] In the invention according to claim 15, in the document-retrieval system according to claim 13 or 14,
the related keyword automatic-extracting equipment according to any one of claims 1 to 10 is used as the
related keyword automatic-extracting equipment.

[0023] The invention according to claim 16 is a document-retrieval system which has document-retrieval
equipment comprising a reference condition input section which inputs the conditional expression required
for document retrieval, and a document-retrieval section which searches for documents in an object
document set according to the input reference conditions; and which has related keyword
automatic-extracting equipment connected to the document-retrieval equipment. The reference condition
input section of the document-retrieval equipment has the operation of accepting as reference conditions, in
addition to the reference conditions input by the user, the related keywords sent from the related keyword
automatic-extracting equipment, and retrieval by keyword is performed.

[0024] In the invention according to claim 17, in the document-retrieval system according to claim 16, the
related keyword automatic-extracting equipment according to any one of claims 1 to 10 is used as the related
keyword automatic-extracting equipment.

[0025] Below, concrete embodiments of this invention are explained with reference to the attached
drawings.

[0026] (Embodiment 1) The first embodiment of this invention is explained first. Drawing 1 is a block diagram
showing the composition of the related keyword automatic-extracting equipment concerning the first
embodiment of this invention. First, the statistical information extraction section 13, which operates as
preprocessing, uses the dictionary 12 to extract from the target document set 11 the word statistical
information 14, such as the frequency distribution of words in the whole document set, and the in-document
word statistical information 15, which is, for every document, the statistical information of the words
contained in that document. Drawing 2 (a) is a table-format view showing the structure of the word statistical
information, and drawing 2 (b) is a table-format view showing the structure of the in-document word
statistical information. The word statistical information 14 stores the statistical information of the words
extracted by the statistical information extraction section 13 as a table like the one shown in drawing 2 (a).
By using this table, the total frequency of occurrence of the word "Internet" across all documents and the
number of documents in which it appears can be obtained at high speed. Moreover, the in-document word
statistical information 15 stores the statistical information of the words for every document as a table like
the one shown in drawing 2 (b). From this table, per-document statistical information, for example that in the
document with publication number 0010 the word "Internet" appears 5 times and the word "WWW" appears
twice, can be obtained at high speed.
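
As an illustration only, the two tables described in this paragraph could be built as follows. This is a minimal sketch, assuming the documents have already been split into words (the dictionary-based extraction of section 13 is not modeled); the names `WordStat` and `build_statistics`, the field names, and the sample data are hypothetical and do not come from the drawings.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WordStat:
    """One row of the whole-set table; field names are assumed."""
    total_frequency: int = 0   # occurrences across all documents
    document_count: int = 0    # number of documents containing the word


def build_statistics(documents):
    """Build both tables from {publication_number: [word, ...]}.

    Returns (word_stats, in_document_stats):
      word_stats[word]                -> WordStat for the whole document set
      in_document_stats[pub_no][word] -> frequency of the word in that document
    """
    word_stats = {}
    in_document_stats = {}
    for pub_no, words in documents.items():
        counts = Counter(words)
        in_document_stats[pub_no] = dict(counts)
        for word, freq in counts.items():
            stat = word_stats.setdefault(word, WordStat())
            stat.total_frequency += freq
            stat.document_count += 1
    return word_stats, in_document_stats


# Hypothetical data; document 0010 mirrors the example in the text.
documents = {
    "0010": ["Internet"] * 5 + ["WWW"] * 2,
    "0011": ["Internet", "retrieval"],
}
word_stats, in_document_stats = build_statistics(documents)
print(in_document_stats["0010"])
print(word_stats["Internet"])
```

Keeping both tables as dictionaries keyed by word and by publication number is one simple way to make the two lookups fast; the source does not say how the tables are actually stored.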
[0027] Related keyword automatic-extracting equipment 16 consists of the word statistical information
management section 17, which manages the word statistical information 14 of the whole document set and
the in-document word statistical information 15; the word ranking section 18, which computes the
significance of words; the document set selection section 19, which specifies a subset of the object
documents; and the condition input section 20, which is a means for inputting selection conditions to the
document set selection section 19.

[0028] The operation of related keyword automatic-extracting equipment 16 having this composition is
explained below. First, the document set selection section 19 specifies a document set according to the
conditions input to the condition input section 20. A document set is specified by one of the following three
kinds of means, or by a combination of them.

(1) Specify a document set according to an attribute of the documents. In this case, the document set
selection section 19 has a means to choose documents by an attribute value given beforehand to each
document, such as the genre to which the document belongs, and adopts as the subset the document group
corresponding to the attribute value specified through the condition input section 20.

(2) Specify a document set by a reference formula. In this case, the document set selection section 19 has a
document-retrieval means which specifies the documents matching the reference formula input through the
condition input section 20, and adopts as the subset the document group obtained as the retrieval result. In
addition, if the document-retrieval means has a function which judges the goodness of fit of each document
with the reference formula and ranks the documents in order of goodness of fit, a particular part of the
retrieval results, for example the top 10 documents, may be adopted as the subset.

(3) Use a document set specified by the user. In this case, the document set selection section 19 adopts as
the subset the documents (plural) which the user specified directly through the condition input section 20.

[0029] The document set selection section 19 passes the document set selected as above to the word statistical
information management section 17 as a set of identifiers that uniquely determine each document, for example
a list of publication numbers. For the specified document set, the word statistical information management
section 17 looks up the word statistical information 15 in a document by publication number for each document,
and obtains the words that appear in that document and their frequency of occurrence in it. Next, it looks up
the word statistical information 14 of the whole document collection for all the words obtained, and acquires
the frequency and distribution information of each word across the whole collection.
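
The two lookups described above can be sketched in Python as follows. This is a minimal illustration, not the equipment's actual data structure: the dictionaries `WHOLE_STATS` and `DOC_STATS` and the function `gather_statistics` are hypothetical names, the 0010 row follows the text, the document counts are borrowed from the later worked example, and the remaining figures are invented.

```python
# Hypothetical in-memory stand-ins for the two tables. Only the 0010 row is
# taken from the text; total frequencies and the 0341 row are invented.
WHOLE_STATS = {  # word -> (total frequency, number of documents containing it)
    "Internet": (4000, 1129),
    "WWW": (1500, 615),
    "JAVA": (700, 161),
}
DOC_STATS = {  # publication number -> {word: frequency in that document}
    "0010": {"Internet": 5, "WWW": 2},
    "0341": {"Internet": 1, "JAVA": 3},
}


def gather_statistics(publication_numbers, doc_stats, whole_stats):
    """Return (per-document stats, whole-collection stats) for a document set."""
    # Step 1: per-document word frequencies, looked up by publication number.
    per_doc = {pn: doc_stats[pn] for pn in publication_numbers}
    # Step 2: collection-wide statistics for every word seen in step 1.
    words = set()
    for freqs in per_doc.values():
        words.update(freqs)
    whole = {w: whole_stats[w] for w in sorted(words)}
    return per_doc, whole


per_doc, whole = gather_statistics(["0010", "0341"], DOC_STATS, WHOLE_STATS)
print(per_doc)
print(whole)
```

Keying both tables by a hashable identifier keeps each lookup constant-time on average, which is one plausible way to meet the "high speed" requirement.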

[0030] The various statistical information obtained here is passed to the word ranking section 18, and the
significance of each word is computed. The significance S(W) of a word "W" can be computed, for example, as
follows.
[Equation 1]

S(W) = C * Σ(j=1..n) {TFj(W) * IDF(W)} * FN(W)

where
C: a constant
n: the number of documents contained in the specified document set
TFj(W): the frequency of occurrence of the word "W" in document Dj
FN(W): the number of documents in the specified document set that contain the word "W"

[0031] IDF(W) is an index called the idf value of the word "W", and is calculated, for example, by the
following formula.

IDF(W) = 1-log(DF(W)/N)

where
DF(W): the number of documents in the whole document collection in which the word "W" appears
N: the total number of documents

[0032] IDF(W) becomes smaller as the word "W" appears in more documents (that is, as it is a more general
word). The significance of words that appear relatively often across the whole target document collection can
thereby be kept low. Furthermore, by taking FN(W) into account, the significance of words that appear in many
documents of the specified document set is raised, and as a result a high significance can be given to words
characteristic of that specific document set. In the above calculation, TF(W) may be normalized by the size of
the document containing the word (the number of characters, or the number of distinct words contained), by the
total frequency of occurrence of the word, and so on.

[0033] The word ranking section 18 performs the significance calculation for all the words contained in the
documents of the specified subset, and then sorts all the words in order of significance. Finally, a part of the
sorted word group, for example the top 10 words, is adopted and presented as pairs of a word (or word group) and
its significance. At extraction time, the various statistical information used for the significance calculation
may be presented together with the significance. The pairs of extracted related keywords and their significance
can also be accumulated as a user's history. Doing so makes it possible to express the range of a user's
interests, tastes, and so on as a vector of keywords and their weights, and this vector can be used widely for
other operations, for example retrieval over a document set.

[0034] Using the above formula, related keyword automatic extraction can be carried out as in the example
shown in drawing 3. Drawing 3 shows the flow of the related keyword automatic-extracting procedure. In
drawing 3, the word statistical information management section 17, to which the publication-number list 31 has
been inputted, outputs for each document the words that appear in the corresponding publication number (for
example, 0010, 0341, and so on) and their frequencies, yielding the word statistical information 33, 34, and 35
in each document. At the same time, the statistical information 32 across the whole document collection is
obtained for all the words found here. These pieces of statistical information 32, 33, 34, and 35 are then
passed to the word ranking section 18. In the word ranking section 18, the significance of each word is
calculated from the statistical information 32-35 using the above formula. For the case of drawing 3, the
results are as follows (with C set to 1 and N set to 10000).

IDF(applet) = 1-log(86/10000) = 5.756
S(applet) = (2*5.756+6*5.756)*2 = 92.096
IDF(Internet) = 1-log(1129/10000) = 3.181
S(Internet) = (3*3.181+1*3.181+2*3.181)*3 = 57.258
IDF(CGI) = 1-log(79/10000) = 5.840
S(CGI) = (4*5.756)*1 = 23.024
IDF(WWW) = 1-log(615/10000) = 3.789
S(WWW) = (5*3.789)*1 = 18.945
IDF(JAVA) = 1-log(161/10000) = 5.129
S(JAVA) = (6*5.129+3*5.129+3*5.129)*3 = 184.644
IDF(SUN) = 1-log(35/10000) = 6.655
S(SUN) = (6*6.655)*1 = 39.930
IDF(script) = 1-log(813/10000) = 3.510
S(script) = (5*3.510)*1 = 17.550

[0035] In the word ranking section 18, the words are sorted by the significance obtained as above, and the
sorted word list 37 is obtained. If the specification is to extract the top three ranked words, "JAVA",
"applet", and "Internet", which are the top three in the word list 37, are extracted as related keywords.
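
The ranking procedure for this example can be reproduced with a short Python sketch. The function and variable names are illustrative, and the use of the natural logarithm is an assumption inferred from the printed values (1 - ln(86/10000) is about 5.756). Note that computing S(CGI) with its own idf value gives about 23.36, whereas the printed 23.024 corresponds to 4*5.756; the top-three ranking is the same either way.

```python
import math

N = 10000  # total number of documents, as in the worked example
C = 1.0    # constant C, set to 1 in the worked example

# word -> (DF in the whole collection, TF in each selected document containing it)
EXAMPLE = {
    "applet": (86, [2, 6]),
    "Internet": (1129, [3, 1, 2]),
    "CGI": (79, [4]),
    "WWW": (615, [5]),
    "JAVA": (161, [6, 3, 3]),
    "SUN": (35, [6]),
    "script": (813, [5]),
}


def idf(df, n=N):
    """IDF(W) = 1 - log(DF(W)/N), with log assumed to be the natural logarithm."""
    return 1.0 - math.log(df / n)


def significance(df, tfs, c=C, n=N):
    """S(W) = C * sum_j(TFj(W) * IDF(W)) * FN(W); FN is the number of selected
    documents that contain the word, i.e. the length of the TF list."""
    fn = len(tfs)
    return c * sum(tf * idf(df, n) for tf in tfs) * fn


def rank(stats):
    """Return (word, significance) pairs sorted by descending significance."""
    scored = {w: significance(df, tfs) for w, (df, tfs) in stats.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


for word, score in rank(EXAMPLE)[:3]:
    print(f"{word}: {score:.3f}")
```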

[0036] Although the above description takes a single word registered in the dictionary as the unit of
extraction, the unit may in general be a word group as well as a word. A word group refers to, for example, a
compound formed by consecutive nouns, a pair of nouns connected by the particle "", or a pair of a noun and a
verb connected by a particle "**" or "**". If the statistical information for such word groups is extracted
in advance in the same way as for words, the technique shown above applies as it is, and word groups can be
extracted as related keywords.

[0037] In related keyword automatic-extracting equipment 16, the document set selection section 19 and the
condition input section 20 may also be provided as separate components. In particular, when the document set
selection section 19 has a document-retrieval means based on a retrieval expression, providing them separately,
as shown later in drawing 7, allows the equipment to receive the publication numbers produced by
document-retrieval equipment as its input, and allows the outputted related keywords to be fed back into the
retrieval expression input section of that document-retrieval equipment.

[0038] Thus, according to this embodiment, when a subset of the target documents is specified, the significance
of each word appearing in the documents of that subset is calculated, the words are sorted in order of
significance, and a part of the sorted word group is extracted as the related keywords. This has the effect that
related keywords based on the properties of the target documents can be obtained dynamically and at high speed.

[0039] Moreover, the related keywords obtained as above can be used as an input to document-retrieval
equipment for the same documents. In that case, not only can accurate keywords suited to the properties of the
target documents be reused, but since each related keyword is guaranteed to be contained in the target
documents, a retrieval using it is certain to return a result.

[0040] The obtained related keywords can be used as an input to document-retrieval equipment for the same
target document set or for another target document set. In that case, the same or another document set can be
searched on the basis of keywords characteristic of the document set from which the related keywords were
extracted. In particular, with document-retrieval equipment that searches another document set, this has the
effect that the keywords can also be applied to a document set with different properties.

[0041] Moreover, when a user performs a repeat retrieval, the extracted keywords are presented to the user for
selection. Instead of inputting retrieval conditions again from a keyboard, the user can choose related keywords
by simple operations such as a mouse click. This lightens the work of repeat retrieval, raises retrieval
efficiency, and has the effect that even a user unfamiliar with retrieval operations can use the equipment
easily.

[0042] Moreover, with document-retrieval equipment that can give a weight to each word in a retrieval
condition, for example equipment that calculates a goodness of fit with the retrieval conditions and ranks the
documents, presenting each extracted related keyword together with its significance and using the keyword and
its significance directly as input has the effect that a highly precise retrieval result can be obtained.

[0043] Moreover, by accumulating the pairs of extracted related keywords and their significance as a user's
history, it becomes possible to express the range of a user's interests, tastes, and so on as a vector of
keywords and their weights, which has the effect that this vector can be used widely, for example for retrieval
over other document sets.

[0044] (Embodiment 2) Next, the second embodiment of this invention is explained using drawing 1, the same
block diagram as shown for Embodiment 1. In this second embodiment, the document set selection section 19
specifies two document sets, document set A and document set B, where document set B is a subset of document
set A. Examples are the case where document set A is obtained as the result of a retrieval with a certain
retrieval expression and document set B is the group of documents among them that the user designated as
related, and the case where document set A is specified according to an attribute of the documents and
document set B is narrowed down further within it by a retrieval expression.

[0045] In this case, the significance of a word is computed by, for example, multiplying the significance of
the word by the distribution index of the word, which is computed by the following formula.

DI(A,B,W) = {(NA/DA(W))*(DB(W)/NB)}

where
DA(W): the number of documents in subset A in which the word "W" appears
NA: the total number of documents in subset A
DB(W): the number of documents in subset B in which the word "W" appears
NB: the total number of documents in subset B

[0046] This index takes a higher value for a word that appears with high frequency in Subset B and with lower
frequency in Subset A. A word with a high value in the above formula contributes greatly to discriminating
Subset B within Subset A, and can be said to be a keyword that better characterizes Subset B. For example, in
the example shown in drawing 3, suppose that the publication-number list 31 is Subset B, that Subset A
containing it (100 documents in total) is specified at the same time, and that the number of documents in
Subset A in which each word appears is as follows.

DA(applet) = 10
DA(Internet) = 28
DA(CGI) = 9
DA(WWW) = 14
DA(JAVA) = 20
DA(SUN) = 5
DA(script) = 10

[0047] In this case, the significance S2(W) of each word is the value obtained by multiplying the significance
S(W) of each word explained in Embodiment 1 by the weight DI(A, B, W) of the word, and is calculated as follows.

S2(applet) = 92.096 * {(100/10)*(2/3)} = 613.973
S2(Internet) = 57.258 * {(100/28)*(3/3)} = 204.493
S2(CGI) = 23.024 * {(100/9)*(1/3)} = 85.274
S2(WWW) = 18.945 * {(100/14)*(1/3)} = 45.107
S2(JAVA) = 184.644 * {(100/20)*(3/3)} = 923.220
S2(SUN) = 39.930 * {(100/5)*(1/3)} = 266.200
S2(script) = 17.550 * {(100/10)*(1/3)} = 58.500

Sorted in order of significance, these become:

S2(JAVA) = 923.220
S2(applet) = 613.973
S2(SUN) = 266.200
S2(Internet) = 204.493
S2(CGI) = 85.274
S2(script) = 58.500
S2(WWW) = 45.107

Therefore, if the top three are extracted as related keywords, "JAVA", "applet", and "SUN" become the
related keywords.
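
The reweighting by the distribution index can be checked with the following Python sketch. It takes the S(W) values as printed for Embodiment 1 and the DA and DB counts of this example as given; the function and variable names are illustrative only.

```python
def distribution_index(na, da, nb, db):
    """DI(A,B,W) = (NA/DA(W)) * (DB(W)/NB)."""
    return (na / da) * (db / nb)


NA, NB = 100, 3  # total documents in Subset A and in Subset B

# Significance S(W) as printed for Embodiment 1.
S = {"applet": 92.096, "Internet": 57.258, "CGI": 23.024, "WWW": 18.945,
     "JAVA": 184.644, "SUN": 39.930, "script": 17.550}
# Number of documents containing each word in Subset A and in Subset B.
DA = {"applet": 10, "Internet": 28, "CGI": 9, "WWW": 14,
      "JAVA": 20, "SUN": 5, "script": 10}
DB = {"applet": 2, "Internet": 3, "CGI": 1, "WWW": 1,
      "JAVA": 3, "SUN": 1, "script": 1}

# S2(W) = S(W) * DI(A,B,W)
s2 = {w: S[w] * distribution_index(NA, DA[w], NB, DB[w]) for w in S}
ranked = sorted(s2, key=s2.get, reverse=True)

for w in ranked:
    print(f"S2({w}) = {s2[w]:.3f}")
```

Compared with Embodiment 1, "SUN" overtakes "Internet" because it appears in only 5 of the 100 documents of Subset A, so its presence in Subset B is more distinctive.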

[0048] The above formula is an example; other formulas may be used as long as they yield a high value for a
word that appears with high frequency in Subset B and with low frequency in Subset A.
[0049] Thus, according to this embodiment, taking into account the difference in frequency distribution
between the two specified subsets has the effect that highly precise related keywords can be obtained.

[0050] (Embodiment 3) Next, the third embodiment of this invention is explained using drawing 1, the same
block diagram as shown for Embodiment 1. In this third embodiment, the document set selection section 19 is
provided with a function that gives a weight to each document. Examples are the case where a user designates
documents and gives each document a five-level evaluation value using its degree of relevance as the index, and
the case where the documents obtained by a retrieval expression are ranked by goodness of fit with the
retrieval expression and given weights such as ten points for 1st place and nine points for 2nd place. The word
ranking section 18 performs the significance calculation by multiplying the words contained in each document by
the weight given to that document. The weight given to a document may be a negative value. For example, when a
user designates documents, a weighting such as two points for a related document and -1 point for a completely
unrelated document is allowed. This makes it possible to lower the significance of words that are also
contained in documents unrelated to the related documents (and that are not particularly general).
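
The text does not give an explicit formula for this embodiment. One possible reading, sketched below in Python, multiplies each document's term TFj(W) * IDF(W) in the Embodiment 1 formula by that document's weight; this placement of the weight, the natural logarithm, and all names are assumptions made for illustration.

```python
import math


def weighted_significance(tf_by_doc, doc_weight, df, n_docs, c=1.0):
    """Assumed variant of S(W): C * sum_j(w_j * TFj(W) * IDF(W)) * FN(W).

    tf_by_doc:  {document id: frequency of the word in that document}
    doc_weight: {document id: weight given to the document (may be negative)}
    df, n_docs: document frequency of the word and total number of documents
    """
    idf = 1.0 - math.log(df / n_docs)
    fn = len(tf_by_doc)
    return c * sum(doc_weight[d] * tf * idf for d, tf in tf_by_doc.items()) * fn


# A related document gets 2 points, a completely unrelated one gets -1 point.
weights = {"d1": 2, "d2": -1}
only_related = weighted_significance({"d1": 3}, weights, 100, 10000)
only_unrelated = weighted_significance({"d2": 3}, weights, 100, 10000)
print(only_related, only_unrelated)
```

Under this reading, a word that occurs only in the unrelated document ends up with a negative significance and drops to the bottom of the ranking.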

[0051] Thus, according to this embodiment, giving a weight to each document contained in the specified document
set, and using a formula in which words contained in more important documents receive a higher significance,
has the effect that highly precise related keywords that take the importance of each document into account are
obtained.

[0052] (Embodiment 4) Next, the fourth embodiment of this invention is explained. Drawing 4 is a block diagram
of the related keyword automatic-extracting equipment according to the fourth embodiment of this invention. In
this fourth embodiment, the equipment has the threshold setting section 22 in addition to the composition of
Embodiment 1, and the threshold setting section 22 can exchange data with the word statistical information
management section 17. In addition, the word statistical information management section 17 is given a function
for excluding words by threshold. With this composition, when the word statistical information management
section 17 outputs the statistical information of each word, it refers to the threshold settings 22 defined
beforehand, excludes words of extremely high or extremely low frequency from the candidates on the spot, and
does not output information on those words to the word ranking section 18. For example, by setting threshold 1
to "words that appear in 50% or more of all documents" and threshold 2 to "words that appear in only one
document", the adverse influence of these words on the significance calculation can be prevented in advance,
and processing can be made faster.
[0053] Several kinds of thresholds may be set according to a feature of the word, such as its length. In the
case of Japanese, for example, the threshold may be set as "50% or more of the whole for words of two or more
characters, and 30% or more of the whole for single-character words", so that the range of excluded words is
set in accordance with the properties of each word.
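
The threshold-based exclusion can be sketched in Python as below. The function name `passes_thresholds` and its parameters are hypothetical; the default values follow the examples in the text (50% for words of two or more characters, 30% for single-character words, and exclusion of words that appear in only one document).

```python
def passes_thresholds(word, df, n_docs,
                      max_ratio_multi=0.5, max_ratio_single=0.3, min_df=2):
    """Return True if the word should be kept as a candidate.

    df:     number of documents in the whole collection containing the word
    n_docs: total number of documents in the collection
    """
    # Upper threshold depends on word length (single character vs. longer).
    limit = max_ratio_single if len(word) == 1 else max_ratio_multi
    if df / n_docs >= limit:
        return False  # extremely high frequency: too general
    if df < min_df:
        return False  # appears in only one document: too rare
    return True


candidates = {"JAVA": 161, "the": 6000, "x": 3500, "xy": 3500, "rare": 1}
kept = [w for w, df in candidates.items() if passes_thresholds(w, df, 10000)]
print(kept)
```

Filtering before ranking means the excluded words never reach the significance calculation, which is where the speed-up described above would come from.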

[0054] Thus, according to this embodiment, excluding words whose frequency of appearance in the whole target
document set is extremely high or extremely low, on the basis of thresholds defined beforehand, speeds up
keyword extraction processing and has the effect that only words that are highly effective for reuse are
selected.

[0055] (Embodiment 5) Next, the fifth embodiment of this invention is explained. Drawing 5 is a block diagram
showing the composition of the related keyword automatic-extracting equipment according to the fifth embodiment
of this invention. The related keyword automatic-extracting equipment according to this fifth embodiment adds
to the basic composition explained in Embodiment 1, namely the word statistical information management section
17 which manages the word statistical information 14 of the whole document collection and the word statistical
information 15 in each document, the word ranking section 18, the document set selection section 19 which
specifies a subset of the target documents, and the condition input section 20 which is the means for inputting
selection conditions to the document set selection section 19. Its aim is to raise the quality of the extracted
related keyword group by having the word ranking section 18 work together with, and use, various information
such as the attributes of words. In drawing 5, 25 is the appearance information investigation section, 26 is
the word attribute investigation section, and 27 is the character string inclusion relation judging section;
these functional sections are contained in related keyword automatic-extracting equipment 29 and work together
with the word ranking section 18. 28 is the representative word assignment section, which receives data from
the word ranking section 18 and outputs the related keywords. In addition, the word appearance positional
information extraction section 23, which extracts information on the positions at which words appear from the
data of the target document set 11, is provided as a functional section external to related keyword
automatic-extracting equipment 29, and outputs the appearance positional information 24. This appearance
information is sent to the appearance information investigation section 25.
[0056] The operation of the fifth embodiment of this invention having this composition is explained. In this
embodiment, the statistical information extraction section 13, which operates as preprocessing on the target
document set 11 using the dictionary 12, first extracts the word statistical information 14, such as the
frequency of occurrence and distribution of each word in the whole target document set 11, and the word
statistical information 15 in a document, which is the statistical information of the words contained in each
document. At the same time, if necessary, the word appearance positional information extraction section 23 also
extracts the appearance positional information 24 of the words. Drawing 6 is a table showing an example of the
data structure of the appearance positional information 24 extracted by the word appearance positional
information extraction section 23. The appearance positional information is stored as a table as shown in
drawing 6. For each document, the words that appear in the document, their appearance positions (for example,
the byte offset from the head of the document), their appearance sections, and so on are stored.

[0057] During related keyword automatic extraction, the appearance information investigation section 25 is
queried for each word, information such as the appearance position and appearance context of the word is
acquired, and this is factored into the significance calculation. For example, when all the documents subject
to retrieval consist of elements such as a title (or header), a subtitle, and the text, significance is computed
by multiplying the significance of each word by a "weight" that depends on which of these elements contains
the word, such as three points if it is contained in the title, two points if it is contained in a subtitle,
and one point if it is contained in the text.
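
A minimal Python sketch of this position-based weighting follows. It reads the weights as 3 for the title, 2 for a subtitle, and 1 for the text, which is one interpretation of the paragraph above. How to treat a word that occurs in several elements is not stated; taking the largest applicable weight is an assumption, as are the names `ZONE_WEIGHT` and `zone_weighted`.

```python
# Weight per document element (interpretation of the example in the text).
ZONE_WEIGHT = {"title": 3, "subtitle": 2, "text": 1}


def zone_weighted(significance, zones):
    """Multiply a word's significance by the weight of the element it occurs in.

    zones: the elements of the document in which the word appears.
    Assumption: when the word appears in several elements, the largest
    weight is applied.
    """
    return significance * max(ZONE_WEIGHT[z] for z in zones)


print(zone_weighted(10.0, ["text", "title"]))
print(zone_weighted(10.0, ["subtitle"]))
print(zone_weighted(10.0, ["text"]))
```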

[0058] Alternatively, the information on the appearance position itself may be used. For example, when the
subset is specified by a retrieval expression and the words contained in the retrieval expression can be
referred to, it is also possible to compute significance by multiplying the significance of the word by a
"weight" that depends on the number of characters between a word contained in the retrieval expression and
the word currently subject to the significance calculation, such as three points if the distance is within two
characters, two points if it is within ten characters, and one point otherwise.

[0059] As another mode of this embodiment, the word attribute investigation section 26 is queried for each
word, attributes of the word such as its part of speech and classification are acquired, and these are factored
into the significance calculation. For example, it is also possible to compute significance with attention to
the part of speech, by multiplying the significance of each word by a "weight" such as five points for a
proper noun, four points for a common noun, two points for an adjective or adjectival verb, one point for a verb
or adverb, and zero points for anything else that is not an independent word (a particle, auxiliary verb, etc.).

[0060] As another mode of this embodiment, the character string inclusion relation judging section 27, which
judges the inclusion relation as character strings between any two words, is used to judge whether an inclusion
relation exists between one extracted word and another extracted word, or between a word group specified
beforehand and an extracted word; when an inclusion relation is judged to exist, the words to be extracted are
restricted. The word group specified beforehand here is, for example, the words contained in the retrieval
expression when a retrieval expression was used to specify the subset. In judging the inclusion relation, a
case that satisfies any of the following criteria (one or more, according to a setting defined beforehand) is
recognized as an inclusion relation.

(1) The word "A" and the word "B" match at the front, and the word "A" is shorter than the word "B".
(2) The word "A" and the word "B" match at the back, and the word "A" is shorter than the word "B".
(3) The word "A" is a part of the word "B", and neither the front nor the back matches.
(4) The relation between the word "A" and the word "B" satisfies any of (1)-(3), and the word "A"
completely matches a component of the word "B".

[0061] For example, under criterion (1), "Tokyo" is judged to be an included word of "Tokyo" (a shorter and
a longer source word that are rendered identically in this translation). Similarly, "sale" is judged to be an
included word of "new sale" under criterion (2), and "gratitude" an included word of "large Thanksgiving
Day" under criterion (3). Criterion (4) is important for inclusion judgment in English: under this criterion,
for "artificial intelligence", "art" and "tell" are not included words, whereas "artificial" and
"intelligence" are judged to be included words.

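The four criteria can be illustrated with a short Python sketch. The function name `inclusion_kind` is hypothetical, and for criterion (4) the components of the longer word are assumed to be its whitespace-separated tokens unless a component list is supplied (for Japanese, a dictionary-based segmentation would be needed instead).

```python
def inclusion_kind(a, b, require_component=False, components=None):
    """Judge whether word `a` is included in word `b` as a character string.

    Returns "prefix" (criterion 1), "suffix" (criterion 2), "infix"
    (criterion 3), or None when no inclusion relation holds.
    With require_component=True (criterion 4), `a` must additionally match
    one component of `b` exactly.
    """
    if len(a) >= len(b):
        return None
    if b.startswith(a):
        kind = "prefix"
    elif b.endswith(a):
        kind = "suffix"
    elif a in b:
        kind = "infix"
    else:
        return None
    if require_component:
        parts = components if components is not None else b.split()
        if a not in parts:
            return None
    return kind


phrase = "artificial intelligence"
for candidate in ["art", "tell", "artificial", "intelligence"]:
    print(candidate,
          inclusion_kind(candidate, phrase),
          inclusion_kind(candidate, phrase, require_component=True))
```

Without criterion (4), "art" and "tell" would both be treated as included in "artificial intelligence"; requiring an exact component match filters them out while keeping "artificial" and "intelligence".
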
[0062] For two words judged to be in an inclusion relation by the above criteria, which of them is adopted as a
related keyword is decided by one of the following criteria (set beforehand).
(1) Adopt the word of the longer unit.
(2) Adopt the word of the shorter unit.
(3) Adopt the word with the higher significance.
(4) Adopt the word of the shorter unit, and the difference between the word of the longer unit and the word of
the shorter unit.

[0063] For example, suppose that one word "Tokyo" is extracted with significance 10 and another word "Tokyo"
with significance 7, and that an inclusion relation holds between them. Under criterion (1), the "Tokyo" that
is longer as a character string is adopted; under criterion (2), the "Tokyo" that is shorter as a character
string is adopted; and under criterion (3), the "Tokyo" with the higher significance is adopted. Under
criterion (4), when an inclusion relation holds, for example, between the words "artificial intelligence" and
"artificial", "artificial" and "intelligence" are adopted as related keywords; this criterion is effective
mainly for English documents.
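
The four adoption criteria can be sketched in Python as follows. The function `resolve` and its argument layout are hypothetical, the significance values in the usage lines are invented, and preferring the longer word when the two significances are equal under criterion (3) is an assumption not stated in the text.

```python
def resolve(short, long_, policy):
    """Choose which word(s) of an inclusion pair to adopt as related keywords.

    short, long_: (word, significance) pairs, where `short` is included in
    `long_` as a character string. policy: criterion number 1 to 4.
    """
    (short_word, short_sig), (long_word, long_sig) = short, long_
    if policy == 1:   # adopt the word of the longer unit
        return [long_word]
    if policy == 2:   # adopt the word of the shorter unit
        return [short_word]
    if policy == 3:   # adopt the word with the higher significance
        return [short_word if short_sig > long_sig else long_word]
    if policy == 4:   # adopt the shorter word and the remaining difference
        rest = long_word.replace(short_word, "", 1).strip()
        return [short_word, rest] if rest else [short_word]
    raise ValueError(f"unknown policy: {policy}")


pair = (("artificial", 7.0), ("artificial intelligence", 10.0))
for policy in (1, 2, 3, 4):
    print(policy, resolve(*pair, policy))
```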

[0064] For a word that is in an inclusion relation with the word group specified beforehand, the techniques
other than (3) can be used. In this case, the processing becomes "do not adopt the word as a related keyword
if it is the shorter unit (or the longer unit)". Any of the techniques can be used when the inclusion relation
holds between extracted words.

[0065] As another mode of this embodiment, the extracted related keyword group is classified using the
attributes and statistical information of each word, and presented. If the part of speech is used as the
attribute of a word, the words can be classified and presented as, for example, proper nouns and others. It is
also possible to use a thesaurus as the attribute of a word, and to classify and present each word according to
its classification in the thesaurus. As a classification using statistical information, one technique is to
classify by the number of documents in the specified document set in which each word appears. In that case,
classifying by a criterion such as "whether the number of documents in which the word appears is 80 percent or
more of the document set" allows the narrowing-down effect of using the word in a repeat retrieval to be
checked in advance. When a thesaurus is used as the attribute of a word for classification, it is also possible
to assign the word corresponding to the parent node in the thesaurus to the classified word group as a
representative word, and to let that word represent the group. Similarly, when the statistical information 14
of words is used, the word with the highest frequency of occurrence in the classified word group may be adopted
as the representative word.

[0066] Thus, according to this embodiment, using the information on the positions at which words appear makes
it possible to extract related keywords that take into account the structure of a document or the distance
between words, which has the effect that highly precise related keyword extraction becomes possible.

[0067] Moreover, taking into account attribute information of each word, such as its part of speech, makes it
possible to extract related keywords according to the features of each attribute, which has the effect that
highly precise related keyword extraction becomes possible.

[0068] Moreover, taking into account the inclusion relation as character strings between words makes it
possible to extract related keywords while eliminating words of the same meaning and use, which has the effect
that redundancy in the related keywords as a whole can be suppressed.

[0069] Moreover, by classifying the extracted related keywords and, if necessary, setting a representative
word for each classification, the related keywords can be extracted with their overview, tendencies,
effectiveness for reuse, and so on checked beforehand, which has the effect that the convenience of the related
keywords is improved.

[0070] (Embodiment 6) Next, the sixth embodiment of this invention is explained. Drawing 7 is a block diagram
showing the composition of the document-retrieval equipment according to the sixth embodiment of this
invention, and of a document-retrieval system realized by combining it with related keyword
automatic-extracting equipment. This document-retrieval equipment 41 operates in cooperation with the related
keyword automatic-extracting equipment according to the above first, second, third, fourth, or fifth
embodiment.

[0071] The document-retrieval equipment 41 in this embodiment comprises the retrieval condition input section
44, which inputs the conditional expression required for a document retrieval; the document-retrieval section
45, which searches for documents according to the inputted retrieval conditions; and the document ranking
section 46, which calculates, for the documents found by the document-retrieval section 45, the goodness of fit
between the inputted retrieval expression and each document. This document-retrieval equipment 41 searches the
same target document set 11 as the related keyword automatic-extracting equipment 48 with which it cooperates,
using the index 43 for document retrieval created beforehand by the index generation section 42 with the same
dictionary 12 as is used for word statistical information extraction. In the related keyword
automatic-extracting equipment 48 of this embodiment, the document set selection section 47 is provided as a
separate component, and the set of identifiers (for example, a list of publication numbers, which are unique)
of the documents in the subset specified by the document set selection section 47 is inputted into related
keyword automatic-extracting equipment 48.

[0072] The operation of this form of operation, equipped with the above composition, is explained. First, based
on the reference conditions inputted into the reference condition input section 44, the document-retrieval
section 45 refers to the index 43 and specifies the documents that satisfy the reference conditions. The
document set obtained here may be used as the reference result document 50 as it is. Alternatively, the
document ranking section 46 may further calculate the goodness of fit between the inputted reference formula
and each document, and the documents sorted in descending order of goodness of fit may be used as the reference
result. The document set 50 of the reference result obtained in this way is returned to the user as the
reference result and, at the same time, passed to the document set selection section 47. The document set
selection section 47 adopts all or a part of the document set passed from the document ranking section 46 as
the input to the related keyword automatic-extracting equipment 48. If the documents are ranked in order of
goodness of fit, the top 10 documents of the reference result may be selected. Moreover, if attribute
information given beforehand to every document is available, only the documents that have a specific attribute
value may be selected.
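The two selection policies described for the document set selection section 47 (the top-ranked documents, or the
documents with a specific attribute value) can be sketched as follows. This is only an illustrative sketch: the
names RankedDocument, select_document_subset, and the "category" attribute are hypothetical and do not appear in
the specification.

```python
from dataclasses import dataclass, field


@dataclass
class RankedDocument:
    # Hypothetical record for one document of the reference result 50.
    doc_id: str                      # identifier, e.g. a publication number
    goodness_of_fit: float           # score from the document ranking section
    attributes: dict = field(default_factory=dict)


def select_document_subset(results, top_n=None, attribute=None):
    """Return the identifiers handed on to related keyword extraction.

    attribute: optional (key, value) pair; only matching documents are kept.
    top_n:     optional count; only the best-scoring documents are kept.
    With neither option, the whole reference result is adopted.
    """
    selected = list(results)
    if attribute is not None:
        key, value = attribute
        selected = [d for d in selected if d.attributes.get(key) == value]
    if top_n is not None:
        selected = sorted(selected, key=lambda d: d.goodness_of_fit, reverse=True)
        selected = selected[:top_n]
    return [d.doc_id for d in selected]
```

Calling select_document_subset(results, top_n=10) corresponds to selecting the top 10 documents of the reference
result; passing attribute=("category", "A") corresponds to selecting by an attribute value.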

[0073] The subset of documents specified by the document set selection section 47 is sent to the related
keyword automatic-extracting equipment 48, which extracts the related keyword group 49 by the procedure shown
in the above first, second, third, fourth, or fifth form of operation. The related keyword group 49 obtained in
this way is returned to the reference condition input section 44 and shown to the user. The user can choose the
required keywords from the shown related keyword group, use them as new reference conditions, and perform
reference again. According to this form of operation, the related keywords obtained by the related keyword
automatic-extracting equipment as mentioned above can be used as an input to the document-retrieval equipment
for the same documents. In this case, not only can accurate keywords that suit the properties of the object
documents be reused, but, because each related keyword is guaranteed to be contained in the object documents, a
reference result is always obtained when a search is performed with it.
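The guarantee stated above can be illustrated with a toy example. The sketch below assumes a whitespace
tokenizer as a stand-in for the shared dictionary 12 and a simple in-subset document count as a stand-in for the
significance calculation of the earlier forms of operation; all function names are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Toy tokenizer (assumption); indexing and extraction must share it,
    # just as the index 43 and the word statistics share dictionary 12.
    return text.lower().split()


def build_index(documents):
    """Map each word to the set of identifiers of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the identifiers of the documents containing the word."""
    return set(index.get(word.lower(), set()))


def related_keywords(documents, subset_ids, exclude=(), limit=3):
    """Pick words from the document subset (placeholder ranking by count)."""
    counts = Counter()
    for doc_id in subset_ids:
        counts.update(set(tokenize(documents[doc_id])))
    for word in exclude:
        counts.pop(word.lower(), None)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:limit]]


documents = {
    "D1": "keyword extraction patent documents",
    "D2": "keyword ranking word statistics",
    "D3": "image compression hardware",
}
index = build_index(documents)
first_result = search(index, "keyword")
suggestions = related_keywords(documents, first_result, exclude=["keyword"])
# Every suggestion was taken from an indexed document, so searching the
# same collection with it cannot return an empty result.
```

The property holds only because the keywords are drawn from documents in the same collection and tokenized the
same way as the index; it is not guaranteed when a different collection is searched.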

[0074] (Form 7 of operation) Next, the seventh form of operation of this invention is explained. Drawing 8 is a
block diagram showing the structure of a document-retrieval system realized by combining the document-retrieval
equipment concerning the seventh form of operation of this invention with related keyword automatic-extracting
equipment. Like the document-retrieval equipment 41 concerning the sixth form of operation, this
document-retrieval equipment 51 operates in cooperation with the related keyword automatic-extracting equipment
concerning the above first, second, third, fourth, or fifth form of operation.

[0075] The document-retrieval equipment 51 in this form of operation consists of the reference condition input
section 54, which inputs the conditional expression required for a document retrieval, and the
document-retrieval section 55, which searches for documents according to the inputted reference conditions. The
document-retrieval equipment 51 in this form of operation searches an object document set 56 that is different
from the one used by the related keyword automatic-extracting equipment 52 with which it cooperates, and the
document-retrieval section 55 is connected to the object document set 56. In addition, the details of the
reference technique are not restricted.

[0076] The operation of this form of operation, equipped with the above composition, is explained below. First,
the related keyword automatic-extracting equipment 52 operates according to the specified conditions and
outputs the related keyword group 53. The reference condition input section 54 in the document-retrieval
equipment 51 takes the related keyword group 53 as an input and shows it to the user. The user can choose only
the required keywords from the shown related keywords, perform reference on the object document set 56 that is
the target of reference, and obtain the reference result document 57.

[0077] Thus, according to this form of operation, the related keywords obtained by the related keyword
automatic-extracting equipment 52 can be used as an input to the document-retrieval equipment 51 for the same
object document set or for another object document set. In this case, the same or another document set can be
searched based on keywords characteristic of the document set that was the object of related keyword
extraction. In particular, with document-retrieval equipment that searches another document set, the keywords
concerned can be applied even to a document set with different properties.

[0078]

[Effect of the Invention] As explained above, according to this invention the related keyword
automatic-extracting equipment is constituted by the document set selection section, which specifies a subset
of the documents; the word statistical information management section, which manages the words that appear in
the whole object document set and in each document, together with their statistical information; and the word
ranking section, which computes the significance of each word that appears in the document subset and sorts the
words in order of significance. It is therefore possible to obtain at high speed the statistical information of
each word in the whole document set and in the specified document subset, to rank at high speed each word that
appears in the specified document set based on its significance, and to show a part of the ranked words as
related keywords.
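The interplay of whole-set statistics and subset statistics in the word ranking section can be sketched as
follows. The significance formula used here (document frequency in the subset multiplied by the logarithm of the
inverse document frequency in the whole set) is an assumed TF-IDF-style stand-in chosen for illustration, not
the formula of this invention; the function names are likewise hypothetical.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each word, the number of documents in which it appears."""
    df = Counter()
    for words in docs.values():
        df.update(set(words))
    return df


def rank_words(all_docs, subset_ids, top_k=5):
    """Rank the words of a document subset by an assumed significance score.

    all_docs:   mapping of document identifier to its list of words
    subset_ids: identifiers of the documents in the specified subset
    """
    n_all = len(all_docs)
    df_all = document_frequencies(all_docs)
    df_sub = document_frequencies({i: all_docs[i] for i in subset_ids})
    # Assumed score: frequent in the subset, rare in the whole set.
    scores = {w: df_sub[w] * math.log(n_all / df_all[w]) for w in df_sub}
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

In a real system the whole-set frequencies would be computed once and stored, so that only the subset statistics
need to be gathered for each request; this sketch recomputes both for brevity.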

[0079] Moreover, by establishing, in addition to the aforementioned composition, a means to manage the attribute
information of a word, its appearance position in a document, and the like, the weight of a word can be changed
or a word that fulfills specific conditions can be deleted from the word group after ranking, which raises the
precision of the extracted word group as related terms. Moreover, a more intelligible presentation of related
keywords can be performed by classifying the extracted word group according to the attributes and statistical
properties of the words.
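The refinement steps named above (changing weights, deleting words that meet a condition, and classifying the
result by attribute) can be sketched as a single pass over the ranked words. The attribute labels such as
"noun" and "particle", the weight table, and the function name are hypothetical and serve only as an
illustration.

```python
from collections import defaultdict


def refine_and_group(ranked_words, word_attributes,
                     excluded=frozenset({"particle"}), weights=None):
    """Filter, re-weight, and classify ranked words by attribute.

    ranked_words:    list of (word, score) pairs after ranking
    word_attributes: mapping of word to an attribute label (assumed labels)
    excluded:        attribute labels whose words are deleted
    weights:         optional mapping of attribute label to a multiplier
    """
    weights = weights or {}
    grouped = defaultdict(list)
    for word, score in ranked_words:
        attr = word_attributes.get(word, "unknown")
        if attr in excluded:
            continue  # delete words that fulfill the exclusion condition
        grouped[attr].append((word, score * weights.get(attr, 1.0)))
    for attr in grouped:
        # Keep each class sorted by descending adjusted score.
        grouped[attr].sort(key=lambda ws: (-ws[1], ws[0]))
    return dict(grouped)
```

Presenting the returned classes separately corresponds to showing the related keywords grouped by word
attribute.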

[0080] Furthermore, a document-retrieval system can be constituted that contains document-retrieval equipment
cooperating with the related keyword automatic-extracting equipment, and the extracted related keywords can be
reused as an input. In that case the extracted related keywords suit the properties of the object documents
and, if the target of reference is the same document group, each keyword is guaranteed to yield at least one
reference result. The effect of being able to perform re-reference efficiently and easily is thereby acquired.


